[v1] Refactor KVCacheConfig #14079
Conversation
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Left some minor comments. In general LGTM!
vllm/v1/kv_cache_interface.py (Outdated)
@@ -87,6 +84,18 @@ class KVCacheTensor:
    size: int  # The size of KV cache Tensor in bytes

@dataclass
class VirtualLayer:
Why do we call this `VirtualLayer`? I believe Woosuk and you had some discussion about this. The problem with `VirtualLayer` is that you can't tell it is related to KV cache and KV cache grouping. To me, a name like "GroupedLayerKV" would be more straightforward.
Discussing in the #hybrid-mem channel on Slack.
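For readers following this thread, here is a minimal sketch of what such a grouping structure might contain, assuming it groups layers that share the same per-layer KV cache spec. The class and field names below are illustrative placeholders, not the actual definitions added in this PR:

```python
from dataclasses import dataclass


@dataclass
class KVCacheSpec:
    """Placeholder for the per-layer spec; the real class in
    vllm/v1/kv_cache_interface.py has more fields."""
    block_size: int


@dataclass
class VirtualLayer:
    """Sketch: a group of real layers that share one KV cache spec."""
    # Names of the model layers represented by this "virtual layer".
    # Because they share the spec below, the KV cache manager can treat
    # them as a single group when allocating blocks.
    layer_names: list[str]
    # The per-layer KV cache spec common to all layers in this group.
    kv_cache_spec: KVCacheSpec
```

Whatever the class ends up being called, the concern above is that the name "virtual layer" alone does not signal this grouping role.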
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
@zhuohan123 Updated the PR based on your review. More discussion on the name of `VirtualLayer` is needed.
Signed-off-by: Chen Zhang <[email protected]>
vllm/v1/core/kv_cache_utils.py (Outdated)
    raise NotImplementedError

def make_kv_cache_configs_consistent(kv_cache_configs: list[KVCacheConfig]):
Looks more like `unify_kv_cache_configs`?
vllm/v1/core/kv_cache_utils.py (Outdated)
    # Change the num_blocks of each rank to the smallest among all ranks. We
    # do not need to shrink the tensor size because it is valid to only use the
    # first `num_blocks` blocks of the tensor.
    num_blocks = min(kv_cache_config.num_blocks
nit:
- num_blocks = min(kv_cache_config.num_blocks
+ min_num_blocks = min(kv_cache_config.num_blocks
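For illustration, a minimal sketch of the unification step described in the excerpt above, assuming `KVCacheConfig` exposes a mutable `num_blocks` field (the simplified class here is a stand-in, not the real definition):

```python
from dataclasses import dataclass


@dataclass
class KVCacheConfig:
    """Simplified stand-in for the per-rank KV cache config."""
    num_blocks: int


def unify_kv_cache_configs(kv_cache_configs: list[KVCacheConfig]) -> None:
    """Shrink every rank's num_blocks to the smallest value among all ranks.

    The underlying tensors do not need to be resized, because it is valid
    to use only the first `num_blocks` blocks of each tensor.
    """
    min_num_blocks = min(cfg.num_blocks for cfg in kv_cache_configs)
    for cfg in kv_cache_configs:
        cfg.num_blocks = min_num_blocks


# Example: three ranks that computed different block counts end up aligned.
configs = [KVCacheConfig(1024), KVCacheConfig(1000), KVCacheConfig(1012)]
unify_kv_cache_configs(configs)
assert all(cfg.num_blocks == 1000 for cfg in configs)
```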
vllm/v1/worker/gpu_model_runner.py (Outdated)
                dtype=dtype,
                device=self.device)
        else:
            raise NotImplementedError
Add a TODO
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
@zhuohan123 @comaniac Updated the PR based on your comments. Can you review it again?
LGTM. Approve to unblock. Left to @WoosukKwon and @zhuohan123
Please fix the merge conflict.
Signed-off-by: Chen Zhang <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Louis Ulmer <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Mu Huai <[email protected]>
This PR makes the following changes to KVCacheConfig:
1. Change `KVCacheSpec` from the spec of the whole model to the spec of one layer.
2. `KVCacheConfig` class: save the kv_cache_spec for each KV cache group instead of each layer.
3. Make the KV cache configs of different ranks consistent (`make_kv_cache_configs_consistent`).
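Tying these points together, a rough sketch of how the refactored layout might look, with the per-layer spec stored once per KV cache group rather than once per layer. All names here are illustrative; the actual definitions live in vllm/v1/kv_cache_interface.py:

```python
from dataclasses import dataclass, field


@dataclass
class KVCacheSpec:
    """Per-layer KV cache spec (simplified)."""
    block_size: int


@dataclass
class VirtualLayer:
    """A KV cache group: layers that share one spec."""
    layer_names: list[str]
    kv_cache_spec: KVCacheSpec


@dataclass
class KVCacheConfig:
    """Per-rank KV cache configuration after the refactor."""
    num_blocks: int
    # One entry per KV cache group instead of one spec per layer.
    virtual_layers: list[VirtualLayer] = field(default_factory=list)


# Example: two full-attention layers with identical specs form a single group.
spec = KVCacheSpec(block_size=16)
config = KVCacheConfig(
    num_blocks=1024,
    virtual_layers=[
        VirtualLayer(["model.layers.0.attn", "model.layers.1.attn"], spec)
    ],
)
```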